Dynamic Teaching in Sequential Decision Making Environments
Abstract
We describe theoretical bounds and a practical algorithm for teaching a model by demonstration in a sequential decision-making environment. Unlike previous efforts, which optimized learners that watch a teacher demonstrate a static policy, we focus on the teacher as a decision maker who can dynamically choose different policies to teach different parts of the environment. We develop several teaching frameworks based on previously defined supervised teaching protocols, such as the Teaching Dimension, and extend them to handle noise and the sequences of inputs encountered in a Markov decision process (MDP). We provide theoretical bounds on the learnability of several important model classes in this setting and suggest a practical algorithm for dynamic teaching.
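The abstract builds on the Teaching Dimension, a supervised teaching protocol. As a minimal sketch of that baseline notion only (not the authors' MDP extension, noise handling, or dynamic-teaching algorithm), the Python below brute-forces the Teaching Dimension of a small finite concept class. For each target concept it finds the smallest set of labeled examples that rules out every other concept, then takes the worst case over targets. The threshold class and all function names (`threshold_class`, `teaching_set_size`, `teaching_dimension`) are hypothetical choices for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# A concept is represented by its labels (0/1) on instances 0..n-1.
Concept = Tuple[int, ...]


def threshold_class(n: int) -> List[Concept]:
    """Hypothetical example class: 1-D thresholds on instances 0..n-1.
    Threshold t labels instance x as 1 iff x >= t, for t in 0..n."""
    return [tuple(1 if x >= t else 0 for x in range(n)) for t in range(n + 1)]


def teaching_set_size(target: Concept, concepts: List[Concept]) -> int:
    """Size of the smallest set of labeled examples (x, target[x]) that is
    inconsistent with every other concept in the class.
    Brute force over subsets, so exponential in n: fine only for tiny classes."""
    n = len(target)
    others = [c for c in concepts if c != target]
    for k in range(n + 1):
        for xs in combinations(range(n), k):
            # The shown examples teach `target` if every other concept
            # disagrees with the target on at least one shown instance.
            if all(any(c[x] != target[x] for x in xs) for c in others):
                return k
    # k == n always succeeds, because distinct concepts differ somewhere.
    raise AssertionError("unreachable: distinct concepts differ on some instance")


def teaching_dimension(concepts: List[Concept]) -> int:
    """Teaching Dimension of a finite class: worst case over target concepts."""
    return max(teaching_set_size(c, concepts) for c in concepts)


if __name__ == "__main__":
    thresholds = threshold_class(5)
    print(teaching_dimension(thresholds))
```

For thresholds, the two extreme concepts (all ones, all zeros) can be taught with one example each, while an interior threshold needs the two examples on either side of its boundary, so the class's Teaching Dimension is 2 for any n of at least 2.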